Developmental dynamics of chromatin accessibility during post-implantation development of monkey embryos

Abstract Background Early post-implantation development, especially gastrulation in primates, is accompanied by extensive drastic chromatin reorganization, which remains largely elusive. Results To delineate the global chromatin landscape and understand the molecular dynamics during this period, a single-cell assay for transposase accessible chromatin sequencing (scATAC-seq) was applied to in vitro cultured cynomolgus monkey (Macaca fascicularis, hereafter referred to as monkey) embryos to investigate the chromatin status. First, we delineated the cis-regulatory interactions and identified the regulatory networks and critical transcription factors involved in the epiblast (EPI), hypoblast, and trophectoderm/trophoblast (TE) lineage specification. Second, we observed that the chromatin opening of some genome regions preceded the gene expression during EPI and trophoblast specification. Third, we identified the opposing roles of FGF and BMP signaling in pluripotency regulation during EPI specification. Finally, we revealed the similarity between EPI and TE in gene expression profiles and demonstrated that PATZ1 and NR2F2 were involved in EPI and trophoblast specification during monkey post-implantation development. Conclusions Our findings provide a useful resource and insights into dissecting the transcriptional regulatory machinery during primate post-implantation development.

Recently, advancements in the study of in vitro-cultured embryos have enabled our and other groups to investigate transcriptional and DNA methylation dynamics during early embryonic development of humans and monkeys [3][4][5][6][7]. However, several key questions, including the chromatin status that underlies this transition, have yet to be addressed.
In the mouse, chromatin accessibility, histone modifications and 3D chromatin structures during post-implantation development have been extensively studied and epigenetic regulatory networks have been revealed [8][9][10][11][12][13]. As the large differences exist between primates and mice in terms of post-implantation development, for example, in the morphogenesis of embryonic and extra-embryonic structures, and signaling pathways involved in the specification of embryonic and extra-embryonic lineages [1,2,14,15], the knowledge derived from mouse models could not be straightforwardly extrapolated to primate models. This poses a significant limitation to studies of, for example, the regulation of pluripotent stem cells (PSCs) in primates.
Here, we harness the power of single-cell assay for transposase accessible chromatin sequencing (scATAC-seq) and in vitro culture system to unravel the regulatory chromatin landscape during early embryonic development in cynomolgus monkeys (Macaca fascicularis), as early post-implantation development is similar between the monkey and the human and global ethical considerations exist in humans (14-day rules). This study provides new resources to study chromatin dynamics and chromatin regulation during early embryonic developments in primates.

Single-cell chromatin accessibility profiles of early embryogenesis in the cynomolgus monkeys
To determine the regulatory landscape at a single-cell resolution during cynomolgus monkey peri-and post-implantation development, we performed scATAC-seq of in vitro-cultured cynomolgus monkey embryos from post-fertilization day 9 (d.p.f. 9) to day 20 (d.p.f. 20) as previously reported by our group and single cell RNA-seq data were used for further analysis [5] (Fig. 1A and Supplementary Fig. S1A). In total, 1,198 individual cells were sequenced, and after applying stringent filtration (usable fragments > 10,000, promoter fragments ratio > 10%), 978 high-quality single nuclei, which distributed from d.p.f. 9 and d.p.f. 20, were retained (Supplementary Table S1 and Supplementary Fig. S1A). Cells within per embryo passing filter had the median fragments ranged from 20,020 to 61,182, the median fraction of fragments in promoters (500 bp around transcriptional start site) ranged from 12.14% to 19.01% and the median fraction of fragments in peaks ranged from 51.37% to 64.39 % ( Supplementary Fig. S1B).
Based on these high-quality data, we investigated global gene regulatory activities during cynomolgus monkey early development. First, the resting 978 cells were dimensionally reduced using UMAP and clustering analysis. In scRNA-seq analysis, four main cell clusters, namely epiblast (EPI), trophectoderm/trophoblast (TE), yolk-sac or visceral endoderm (VE/YE), and extra-embryonic mesenchyme cell (EXMC) were identified (Fig. 1B). In addition, coembedding analysis of scRNA-seq and scATAC-seq data revealed a consistent overlap of all cell types ( Supplementary   Fig. S1C), suggesting the power of scATAC-seq in cell identity identification. Next, increased and unique ATAC-seq peaks in promoter and distal regions of cluster-specific marker genes were identified, including OCT4 locus (also known as POU5F1) in the EPI, TFAP2C in the TE, HNF1B in the VE/YE, and TCF 21 in the EXMCs (Fig. 1C). Furthermore, the binding motifs of these marker genes were also enriched in the four major cell clusters, which was accompanied by cluster-specific expressions of the marker genes ( Supplementary Fig. S1D).
We related the cluster-specific differentially peaks (DPs) to the differentially expressed genes (DEGs) (Supplementary Table S2 and Fig. 1D). Furthermore, the enrichment of well-known lineage specific TFs binding motifs was observed for cluster specific DPs, such as OCT4 (POU5F1) and NANOG in the EPI, TFAP2C and 8 TEAD4 in the TE, GATA4 in the VE/YE, and TCF21 and FOXF1 in EXMCs (Fig.   1E). After confirming the above observations, the most enriched Gene Ontology (GO) terms for DPs related to DEGs in the EPI included early-development associated terms such as anterior/posterior pattern specification and embryo development, and the most enriched GO terms in the VE/YE included epithelium development and regulation of WNT signaling. According to the cell identity of EXMCs, mesenchyme development associated terms were enriched in this cell lineage. Interestingly, we observed inflammatory response, regulation of immune system processes and other GO terms enriched in the TE cells, suggesting their role in immune regulation during pregnancy [16] (Fig. 1F). Taken together, these findings indicate that combining embryo in vitro culture platform with powerful single cell chromatin assays can successfully generate comprehensive and high-quality maps of open chromatin corresponding to major cell lineages and lineage regulators during monkey early embryogenesis.

Lineage specific transcriptional regulatory network of early embryonic development in monkey
To further characterize the transcriptional regulatory networks of early embryonic development in the monkey, we identified lineage specific TFs and their enriched motifs as well as lineage specific DPs. In addition, the gene activity scores and expression levels of the TF target genes were analyzed. Based on the above analysis, we constructed a TFs data resource (Supplementary Table S3), which may play important roles in main cell lineage specification during monkey early embryogenesis.
The top ten lineage specific TFs are shown in Fig. 2A, and among them several well-studied cell lineage markers were identified, including POU5F1 (OCT4), HNF4A, CREB3L1 and GCM1 ( Fig. 2A and b). Next, leveraging identified lineage specific TFs and their target genes, we constructed modules of lineage specific TFs and regulatory networks of target genes that were putatively co-regulated by two lineage specific TF modules (EPI-VE/YE, EXMC-VE/YE, EXMC-TE and TE-VE/YE). EPI and VE/YE lineages were highly related in this network by hub TFs, such as FOXH1, indicating their similarities in the regulatory program of early embryogenesis, while TE and EXMC lineages were distinct from the other lineages ( Fig. 2C). The top five GO enrichment terms of target genes are shown in Fig. 2D.
Consistent with the transcriptional regulatory network analysis, we observed that the chromatin regions of co-regulated genes were more accessible in both lineage specific TF modules and the expression of genes displayed a lineage specific pattern, suggesting that besides chromatin accessibility, additional mechanisms exist to guarantee lineage specification during early embryogenesis such as FOXH1 target genes in the EPI and VE/YE (Fig. 2E), as these genes are involved in WNT signaling pathway in the EPI and in VE/YE (Fig. 2F).

Single cell chromatin accessibility reveals regulatory mechanisms of EPI lineage
To explore the regulatory mechanisms underpinnings the identity of EPI cells, we integrated the scRNA-seq and scATAC-seq datasets to query the relationship and the congruence between chromatin accessibility and gene expression using Seurat [17].
The coembedded UMAP plot identified four cell clusters, designate as EPI-A, EPI-B, EPI-C and gastrulating cell (Gast)) ( Fig. 3A). We also observed that most of cells in the scRNA-seq and scATAC-seq datasets overlapped with each other, suggesting that the chromatin accessibility and gene expression in most EPI cells occur in a concordant manner during early embryonic development (Fig. 3A). Next, the developmental trajectories of EPI-A, -B, -C and Gast cells were constructed. We observed that all cells were ordered in a U-like trajectory, with EPI-A cells occupying To delineate the regulatory role of chromatin accessibility in EPI cell lineage progression, we examined the cell identity of specific DPs from EPI-A to Gast cells.
During the transition from EPI-A to B and EPI-C to Gast, most gene regions tended to gain accessibility, whereas they lost accessibility during the transition from EPI-B to C (Supplementary Fig. S2B and Supplementary Table S4). Notably, Hox gene activation was observed when the EPI cells underwent gastrulation, indicating their important roles in primitive streak formation ( Supplementary Fig. S2C). Interestingly, we observed that the activation of chromatin states preceded the expression of genes ( Supplementary Fig. S2D), which implies that there is a time lag between gene expression and chromatin accessibility during EPI cell specification. GO terms and TF binding motif enrichment analyses of DPs indicated that OTC4 and SOX2 may be involve in the transition from EPI-A to B and early post-implantation stage EPI (EPI-C) to gastrulating cells (Gast) (Supplementary Fig. S2D and 2E).
To further study the relationship between chromatin accessibility and gene expression, we related the DPs to the DEGs and two patterns of genes expression and chromatin accessibility were observed. In the first pattern, the opening of chromatin regions preceded the gene expression (designated as the preceding manner), and in the second pattern, the opening of chromatin regions occurred concurrently with the gene expression (designated as the synchronized manner) ( Fig. 3C and D). In the preceding manner, four sub-patterns (clusters 1-4) of gene expression and chromatin accessibility were identified and GO analysis was performed. For example, pluripotency related TFs, such as POU5F1, SOX2, and NANOG, belong to cluster 2, in which chromatin opens at pre-and peri-implantation stages (EPI-A and EPI-B, early-stage EPI), and gene expression was up-regulated until the post-implantation stage (EPI-C) (Fig. 3C), which was also observed during the establishment of mouse PSCs in different pluripotent states [18]. In the synchronized manner, five clusters of chromatin accessibility and gene expression patterns were observed. Interestingly, genes involved in mesoderm formation and gastrulation belonged to the synchronized manner (Fig. 3D).
As the little is known about cell identity and pluripotency regulation of post-implantation EPI cells, we next focused our analysis on EPI-C and Gast cells.
We identified EPI-C and Gast specific TFs (Supplementary Table S5), and the chromatin accessibility and gene expression of TF target genes were determined. We observed that chromatin became accessible preceding the gene expression, indicative of an initial priming process of post-implantation EPI cells prior to their commitment to the specific lineage (Fig. 3E). We further investigated mechanisms leading to the pluripotency transition from early-stage EPI to Gast. We observed that the 14 expressions of core pluripotent factors, such as OCT4 and NANOG, was upregulated in EPI-C cells, which highly correlated with the expression of FGF signaling members such as FGF2, FGF4 and FGF receptor 1 (FGFR1) (Fig. 3F). Furthermore, the expressions of gastrulation marker genes, including TBX3 and CDH2, was highly correlated with BMP and WNT signaling. Neither FGF nor BMP signaling could regulate the expressions of naïve pluripotency related genes [19] ( [20,21]; TE-B cells were defined as early stage cytotrophoblasts (CTs), as they expressed ITGA6 [22], and TE-C were defined as the proliferative CTs, as they expressed the mature CT marker KRT7 (CK7) [22] and the CT markers GATA3 [22] (Fig. 4B). Finally, TE-D were defined as EVTs, as they expressed the EVT markers, ITGA5 [22] and FN1 [23,24] ( Taken together, these findings imply that chromatin is primed for lineage specification in trophoblast progenitor cells and gradually "closed", which is accompanied by the further differentiation of progenitor to mature cell types. Previous study reported that domains of regulatory chromatin (DORCs) are enriched for lineage-determining genes and can be used to infer cell fate choices de novo [26]. To delineate the cis-regulatory programs during trophoblast specification, we defined 941 DORC genes during trophoblast specification based on previously reported criteria (regions with >10 of significant peak-gene associations) [26]. indicating that another mechanism is involved in progenitor cell fate determination.
Finally, GO term enrichment analysis of DORC genes during lineage specification was performed and the top ten GO terms are shown (Fig. 4H).

Lineage segregation between the epiblast and trophoblast
Previous studies indicated that lineage flexibility exists between naïve PSCs and TEs [29,30]; however, the underlying regulatory mechanism remains elusive. We deciphered the regulatory events during EPI and trophoblast lineage specification.
The correlation of gene expression patterns between EPI and trophoblast cells in this study and previously published dataset [5,31] were calculated. We found that early Moreover, the chromatin accessibility profile also displayed a highly correlation coefficient between early-stage EPI (EPI-A) and TE (TE-A) cells ( Supplementary Fig.   S4B). These data suggest that there is a high degree of similarity between early-stage EPI and TE at the transcriptional regulatory level.
To identify the key gene set involved in EPI and TE lineage specification, we explored a set of genes with expression differences gradually increasing between EPI and TE cells from peri to post-implantation transition in this and previously published dataset, respectively [5,31] (Fig. 5B). Next, we assessed the overlap between these two sets of genes and 220 genes (designated as EPI-trophoblast lineage driving genes, E-T driving genes) were identified (Fig. 5C, Supplementary Table S8). To weight the importance of the 220 genes in EPI and trophoblast lineage segregation, the absolute value of the log fold change of the averaged 220 E-T driving genes between the EPI and trophoblast were calculated at peri and post-implantation, respectively (E-T expression difference). The same method was used to calculate the gene activity scores (E-T gene activity difference). Finally, a log fold changes of the E-T expression differences and the E-T activity differences of the 220 genes were calculated. GNAO1 in the EPI and INSL4 in the trophoblast were identified (Fig. 5D). GO term enrichment analysis showed that genes differentially expressed in the EPI were enriched in the regulation of neuron differentiation, which was also observed in early post-implantation EPI cells in vivo [31]. Meanwhile, genes specifically expressed in trophoblast cells were enriched in placenta development and other terms (Fig. 5E). To identify TFs that are critical for EPI and trophoblast lineage segregation, TF motif enrichment analysis for the peaks linked to the 220 genes was conducted and the expression of TFs was also detected ( Supplementary Fig. S4C). Representative TFs that potentially regulate EPI and trophoblast lineage identity are shown in Fig. 5F. To evaluate the weight of TFs in deriving the lineage specification of the EPI and trophoblast, we devised an approach to calculate the driving potential of TFs, which putatively bound to the 220 E-T driving genes (see Methods). We observed clear ordering of ZFNs and PATZ1 on the top of the list of TFs that putatively specify the EPI lineage and NR2F2 on the top of the list of TFs that are involved in trophoblast lineage determination (Fig. 5G), a role of that has been studied in humans [32]. Thus, the driving potential allows us to identify important TFs that play important roles in lineage specification.

Discussion
After implantation, mammalian embryos undergo dramatic lineage diversification and determination, and a multi-faceted regulatory process is involved to guarantee and achieve this cellular and molecular transition [33][34][35]. However, this multi-faceted regulatory process, especially the epigenetic mechanism, remains unsolved in primate post-implantation development. Using the scATAC-seq approach, we delineated the chromatin accessibility landscape of in vitro cultured cynomolgus monkey embryos.
Although the development of in vitro cultured embryos was slower than that of in vivo embryos, this study provides new insights on the transcriptional regulation of primate post-implantation development. Here, a low-throughput manual method was used, and some cell lineages may have been entirely missed. In the future, high-throughput methods and spatial omics sequencing methods will provide more information on primate early post-implantation development.
In integrative scRNA-seq and scATAC-seq analysis, we observed EPI cells in the post-implantation stage, which included early and gastrulating cells that were primed at the chromatin level just prior to final lineage commitment. This finding indicates that EPI cells are in a permissive state after implantation and ready for rapid differentiation into distinct cell types, which is important for gastrulation stage.
Lineage priming was also observed during early differentiation of CTs to mature CTs or EVTs, but not observed during specification of TE cells to early CTs. This observation is inconsistent with that of human hematopoietic stem cells and mouse skin cells [26,36], implying that in addition to chromatin accessibility, another mechanism may underlie progenitor specification in monkey trophoblasts. Based on the above findings, we speculate that chromatin is primed prior to cell fate determination in cells requiring rapid specification.
Naïve PSCs have been reported to possess trophoblast differentiation capability [29,30,[37][38][39][40]. Is this a conserved cellular mechanism in PSCs or does it only exists in cultured PSCs, and which TFs networks and signalling pathways are involved in this process? These questions remain unanswered. In this study, we identified similarities between early stage EPI and TE cells in terms of gene expression and chromatin accessibility. Furthermore, after leveraging the 'driving potential' calculation, we identified a group of TFs that may regulate trophoblast differentiation of naïve PSCs.
Among them, NR2F2 has been confirmed as a marker of trophectoderm maturation 23 [32]. Thus, our study partially answers the above questions. Further studies that combines lineage tracing and genome editing will help us to understand early cell identity and plasticity.
Taken together, our findings not only help us to understand the transcriptional regulation of primates early post-implantation development but also provide valuable resources for regenerative medicine.

Animals
Healthy cynomolgus monkeys (Macaca fascicularis) of 5 to 12 years old were used in this study. Monkeys were usually housed in groups, while during superovulation, oocyte collection they were provisionally caged individually at 16-26℃ under 40% to 70% relative humidity and a 08:00 to 20:00 light vs. dark photoperiod, fed of commercial pelleted food and water ad libitum. Vaginal bleeding was observed twice per day to detect the onset of menses. The beginning of bleeding wad defined as the 1st day of menstruation.

Embryo culture and single cell collection
After washing with phosphate buffered saline (PBS) (MA0008, Meilunbio), the embryos were cut into several pieces with a 1-mL syringe and digested into single cells with 0.1% trypsin (25200-072, GIBCO) at 37℃ for 3-5 minutes. After neutralization with 2% FBS (04-002-1A, Biological Industries), the cells were washed with ice-cold PBS containing 0.1-1% BSA. Finally, the individual cells were picked into ice-cold lysis buffer on ice with a mouth pipette for single cell ATAC libraries construction as previously described [5] .

Pre-processing of single-cell RNA-sequencing (scRNA-seq) data
For the pre-processing of raw sequencing data, we removed adapters and filtered out low-quality reads with an N rate > 0. 2

Cell clustering and uniform manifold approximation and projection (UMAP) projection
We selected the top 2,000 variable genes based on log-transformed TPM matrices using Seurat (v3.2.2) [17]. Principal components analysis (PCA) was performed, and the first 30 principal components (PCs) were used to build an SNN graph using the "FindNeighbors" function in the R package Seurat. We used UMAP [44] to visualize the distance between the cells on a two-dimensional map. Harmony (v1.0) [45] was used to remove the batch effects between the embryos. Cell clustering was performed using the "FindClusters" function in Seurat and cell-types annotations based on known cell-type specific marker genes. We identified marker genes between the clusters using "FindAllMarkers" function in Seurat (p < 0.05). The top 2,000 differentially expressed genes were selected to construct the trajectory model using Monocle (v2.18.0) [46,47].

Comparisons between EPI cells in vitro and in vivo
We extracted the overlapping genes in EPI cells between our and in vivo embryos [31].
The "IntegrateData" function in Seurat was applied to our and in vivo datasets and a UMAP graph was constructed to visualize the similarities between in vitro and in vivo embryos. Next, we averaged the gene expression values of each cell cluster in both datasets and calculated the Pearson correlations between the cell clusters in both datasets. The Euclidean distance between the cell clusters was calculated and unsupervised hierarchical clustering was performed to determine gene expression pattern similarities between the in vitro and in vivo cell clusters.

scATAC-seq dataset analysis
Signac (v0.2.5) [51] was used to analyze the processed scATAC-seq dataset. We used latent semantic indexing (LSI) and UMAP to reduce the dimensions and visualize for the scATAC-seq dataset (dims = 1:6). Gene activity scores were calculated using the "FeatureMatrix" function to count fragments in the 2 kb-upstream regions of genes and gene bodies. Cell clustering was performed using "FindNeighbors" and "FindClusters" functions in Seurat. Differentially accessible regions were identified using the "FindMarkers" function in Seurat with parameters of min.pct = 0.2 and test.use = 'LR'. The per-cell motif activity score was computed by chromVAR ("RunChromVAR" function in Signac). Finally, TF motif enrichment of differentially accessible regions was performed by using the "FindMotifs" function.

Integrated analysis of scRNA-seq and scATAC-seq dataset
scATAC-seq and scRNA-seq pairs were matched by Seurat's canonical correlation analysis (CCA, dims = 1:30) and scRNA-seq cell-type annotations information were transferred to scATAC-seq. As previously described [52], we identified peak-to-gene links based on the null trans correlations. Differential peak-to-gene linkages were For the construction of the transcription factor (TF)-target gene network, we first identified the TFs and marker genes that were highly expressed in the same group of cells. Next, we identified TF-target gene pairs, that is, if the marker gene was linked with peaks that matched the corresponding TF motif. For a given marker gene with at least one linked and matched peak, we summed their squared correlation R² as the linkage score for the TF-target pair. NetworkD3 (v0.4) (https://CRAN.R-project.org/package=networkD3) was used to visualize the TFtarget gene network. For subtype analysis of each individual lineage, we reconstructed the TF regulatory network using data from the corresponding lineage.

Analysis of gained and lost peaks
We identified the accessible peaks that (peak read count was > 0) in each stage (the percentage of cells with accessible peaks was > 0.25). The gained peaks at a particular stage were defined as the accessible peaks nonoverlapping with a previous stage. The lost peaks at a particular stage were defined as the peaks non-existing with this stage compared to a previous stage. Gained and lost peak-to-gene linkages were visualized by ComplexHeatmap [52]. GO analysis of the corresponding genes was performed using clusterProfiler [54].

Identification of genes involving in EPI and TE lineage specification
To study the genes with increasing expression differences during lineage differentiation, we identified the differentially expressed genes in EPI and TE sub-types using the "FindMarkers" function in Seurat and selected genes with increasing log fold change values and -log P values. EPI and TE up-regulated genes were classified based on the log fold change values.

Identification of transcription factors that regulates lineage specification
TF motif enrichment analysis was performed for genes that were upregulated in the EPI and TE. The P values of EPI-upregulated genes were divided by the P values of TE-upregulated genes and subsequently log10-transformed. TFs were classified into EPI and TE groups based on positive and negative of log10 P value fold changes. TFs with absolute values of log10 P value fold changes greater > 1 and corresponding P values < 0.05 were retained.
To examine the importance of TFs in the corresponding group regulatory network, we calculated the degree centrality, closeness centrality and eigenvector centrality of each TF in the network and the rank, respectively. The comprehensive rank was obtained by adding the ranks of the three centralities, and the higher the rank is, the more influential the TFs in the network.

Data Availability
Data and materials availability: All sequencing data were deposited at the National Center for Biotechnology Information Sequence Read Archive

Ethics Approval
All programs were approved by the Institutional Animal Care and Use Committee (IACUC) of Yunnan Key Laboratory of Primate Biomedical Research.

Competing Interests
The authors declare no competing interests.